OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Correction of Medical Handwriting OCR Based on Semantic Similarity

Internal identifier: 000E75 (Main/Exploration); previous: 000E74; next: 000E76

Authors: Bartosz Broda [Poland]; Maciej Piasecki [Poland]

RBID : ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE

Abstract

This paper presents a method for correcting the output of handwriting Optical Character Recognition (OCR) based on semantic similarity. Several ways of extracting semantic similarity measures from a corpus are analysed; the best results were achieved by combining a text-window context with the Rank Weight Function. An algorithm is proposed for selecting a word sequence with high internal similarity. The method was trained on and applied to a corpus of real medical documents written in Polish.
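The idea of selecting a word sequence with high internal similarity can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify: it is a plain exhaustive search that, given several OCR candidates per word position, picks the combination whose summed pairwise similarity is highest. The names `TOY_SIMILARITY`, `similarity`, `internal_similarity` and `best_sequence`, the English placeholder words, and all scores are hypothetical and invented for illustration.

```python
from itertools import product

# Hypothetical toy similarity scores between word pairs (symmetric).
# A real system would derive these from a corpus; the values below are
# invented purely so the sketch can run.
TOY_SIMILARITY = {
    frozenset(("heart", "rate")): 0.8,
    frozenset(("heart", "late")): 0.1,
    frozenset(("heat", "rate")): 0.3,
    frozenset(("heat", "late")): 0.2,
    frozenset(("rate", "normal")): 0.6,
    frozenset(("late", "normal")): 0.1,
    frozenset(("heart", "normal")): 0.5,
    frozenset(("heat", "normal")): 0.2,
}


def similarity(a, b):
    """Look up the toy similarity of two words (1.0 for identical words)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def internal_similarity(seq):
    """Sum the similarity over every unordered pair of words in the sequence."""
    return sum(
        similarity(seq[i], seq[j])
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    )


def best_sequence(candidates):
    """Exhaustively try one candidate per position and keep the best combination."""
    return max(product(*candidates), key=internal_similarity)


# Each inner list holds the (assumed) OCR alternatives for one word position.
candidates = [["heat", "heart"], ["late", "rate"], ["normal"]]
best = best_sequence(candidates)
print(best, internal_similarity(best))
```

Exhaustive search grows exponentially with the number of positions, so it only suits short windows; it is used here because it makes the selection criterion easy to read.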

DOI: 10.1007/978-3-540-77226-2_45


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Correction of Medical Handwriting OCR Based on Semantic Similarity</title>
<author>
<name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
</author>
<author>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1007/978-3-540-77226-2_45</idno>
<idno type="url">https://api.istex.fr/document/DB2FFB6B2516E71CF28B2B430EE662F7753696EE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000165</idno>
<idno type="wicri:Area/Istex/Curation">000163</idno>
<idno type="wicri:Area/Istex/Checkpoint">000890</idno>
<idno type="wicri:doubleKey">0302-9743:2007:Broda B:correction:of:medical</idno>
<idno type="wicri:Area/Main/Merge">000E88</idno>
<idno type="wicri:Area/Main/Curation">000E75</idno>
<idno type="wicri:Area/Main/Exploration">000E75</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Correction of Medical Handwriting OCR Based on Semantic Similarity</title>
<author>
<name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Informatics, Wrocław University of Technology</wicri:regionArea>
<wicri:noRegion>Wrocław University of Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Pologne</country>
</affiliation>
</author>
<author>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Informatics, Wrocław University of Technology</wicri:regionArea>
<wicri:noRegion>Wrocław University of Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Pologne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2007</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">DB2FFB6B2516E71CF28B2B430EE662F7753696EE</idno>
<idno type="DOI">10.1007/978-3-540-77226-2_45</idno>
<idno type="ChapterID">45</idno>
<idno type="ChapterID">Chap45</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In the paper a method of the correction of handwriting Optical Character Recognition (OCR) based on the semantic similarity is presented. Different versions of the extraction of semantic similarity measures from a corpus are analysed, with the best results achieved for the combination of the text window context and Rank Weight Function. An algorithm of the word sequence selection with the high internal similarity is proposed. The method was trained and applied to a corpus of real medical documents written in Polish.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Pologne</li>
</country>
</list>
<tree>
<country name="Pologne">
<noRegion>
<name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
</noRegion>
<name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
</country>
</tree>
</affiliations>
</record>
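A record like the one above can also be read outside Dilib. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the record. One assumption matters: the record uses the `wicri:` prefix without declaring it, so a strict XML parser rejects it as is. The sketch binds the prefix to a placeholder URI (`urn:example:wicri`, hypothetical, not taken from the source) before parsing. The helper names `parse_record` and `summarize` are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above; the wicri: prefix is not declared in it.
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Correction of Medical Handwriting OCR Based on Semantic Similarity</title>
<author>
<name sortKey="Broda, Bartosz" first="Bartosz" last="Broda">Bartosz Broda</name>
</author>
<author>
<name sortKey="Piasecki, Maciej" first="Maciej" last="Piasecki">Maciej Piasecki</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1007/978-3-540-77226-2_45</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Hypothetical placeholder URI: the record does not say what wicri: maps to.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Bind the undeclared wicri: prefix on the root element, then parse."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the main bibliographic fields from the TEI header."""
    stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": pub.find("date").get("when"),
        "doi": pub.findtext("idno[@type='doi']"),
        "rbid": pub.findtext("idno[@type='RBID']"),
    }


summary = summarize(parse_record(RECORD))
for field, value in summary.items():
    print(f"{field}: {value}")
```

Patching the root element keeps the rest of the record untouched, so the same paths work on the full record as long as it starts with a bare `<record>` tag.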

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E75 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000E75 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE
   |texte=   Correction of Medical Handwriting OCR Based on Semantic Similarity
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024